Explicitly disentangling image content from translation and rotation with spatial-VAE
Given an image dataset, we are often interested in finding data generative factors that encode semantic content independently from pose variables such as rotation and translation. However, current disentanglement approaches do not impose any specific structure on the learned latent representations. We propose a method for explicitly disentangling image rotation and translation from other unstructured latent factors in a variational autoencoder (VAE) framework. By formulating the generative model as a function of the spatial coordinate, we make the reconstruction error differentiable with respect to latent translation and rotation parameters. This formulation allows us to train a neural network to perform approximate inference on these latent variables while explicitly constraining them to only represent rotation and translation. We demonstrate that this framework, termed spatial-VAE, effectively learns latent representations that disentangle image rotation and translation from content and improves reconstruction over standard VAEs on several benchmark datasets, including applications to modeling continuous 2-D views of proteins from single particle electron microscopy and galaxies in astronomical images.
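The coordinate-based formulation can be sketched in a few lines: a toy decoder maps each (x, y) pixel coordinate to an intensity, and rotation/translation act on the coordinates before decoding, so the decoder only ever models untransformed content. The tiny random MLP below is a stand-in for illustration, not the paper's architecture.

```python
import numpy as np

def coordinate_grid(n):
    # n x n grid of (x, y) coordinates in [-1, 1], one row per pixel
    lin = np.linspace(-1.0, 1.0, n)
    xx, yy = np.meshgrid(lin, lin)
    return np.stack([xx.ravel(), yy.ravel()], axis=1)  # (n*n, 2)

def transform(coords, theta, t):
    # Rotate each coordinate by theta and translate by t before decoding.
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return coords @ R.T + t

def decode(coords, W1, b1, W2, b2):
    # Toy MLP decoder: pixel intensity as a smooth function of its coordinate.
    h = np.tanh(coords @ W1 + b1)
    return h @ W2 + b2  # (n*n, 1)

rng = np.random.default_rng(0)
n = 8
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

grid = coordinate_grid(n)
img_0 = decode(transform(grid, 0.0, np.zeros(2)), W1, b1, W2, b2).reshape(n, n)
img_r = decode(transform(grid, np.pi / 2, np.zeros(2)), W1, b1, W2, b2).reshape(n, n)

# Rotating the coordinate grid by 90 degrees reproduces the rotated image,
# since the decoder acts pointwise on coordinates.
match = np.allclose(np.rot90(img_0), img_r)
print(match)
```

Because the transform is applied to continuous coordinates rather than to pixels, the reconstruction error is a smooth function of the latent rotation angle and translation vector, which is what makes gradient-based inference over them possible.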
Unsupervised Object Representation Learning using Translation and Rotation Group Equivariant VAE
In many imaging modalities, objects of interest can occur in a variety of locations and poses (i.e. are subject to translations and rotations in 2d or 3d), but the location and pose of an object does not change its semantics (i.e. the object's essence). That is, the specific location and rotation of an airplane in satellite imagery, or the 3d rotation of a chair in a natural image, or the rotation of a particle in a cryo-electron micrograph, do not change the intrinsic nature of those objects. Here, we consider the problem of learning semantic representations of objects that are invariant to pose and location in a fully unsupervised manner. We address shortcomings in previous approaches to this problem by introducing TARGET-VAE, a translation and rotation group-equivariant variational autoencoder framework.
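One way to see why group structure yields pose-invariant semantics: pooling any feature over a finite rotation group produces a rotation-invariant code. Below is a minimal sketch for the C4 group (90-degree rotations); the feature function is a hypothetical stand-in, not TARGET-VAE's encoder.

```python
import numpy as np

def features(img):
    # A deliberately non-invariant toy feature: a row-weighted sum plus one pixel.
    n = img.shape[0]
    mask = np.outer(np.arange(n), np.ones(n))  # weights grow down the rows
    return np.array([(img * mask).sum(), img[0, 0]])

def c4_invariant_features(img):
    # Averaging the feature over all four 90-degree rotations of the input
    # yields a code that is invariant to C4 rotations of the object.
    return np.mean([features(np.rot90(img, k)) for k in range(4)], axis=0)

rng = np.random.default_rng(1)
img = rng.normal(size=(6, 6))
inv_a = c4_invariant_features(img)
inv_b = c4_invariant_features(np.rot90(img))  # same object, rotated 90 degrees

same = np.allclose(inv_a, inv_b)
print(same)  # the pooled code ignores C4 rotations
```

The raw feature changes under rotation, but the group-pooled feature does not; equivariant networks generalize this idea from a 4-element discrete group toward continuous rotations and translations.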
Interpreting Representation Quality of DNNs for 3D Point Cloud Processing: Supplementary Materials
Wen Shen
This section provides more details about the Shapley values in Section 3 of the paper. Efficiency: the overall reward can be allocated to all players in the game, i.e., the Shapley values of all players sum to the total reward. Efficiency (for interactions): the overall reward can be decomposed into interactions of different orders. (This work was done when Wen Shen was an intern at Shanghai Jiao Tong University. Quanshi Zhang is the corresponding author. This study was done under the supervision of Dr. Quanshi Zhang.)
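The efficiency axiom can be checked numerically. The sketch below computes exact Shapley values for a toy three-player game by enumerating all coalitions; the game function v here is a made-up example, not one from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    # Exact Shapley value by coalition enumeration:
    # phi(i) = sum over S subset of N\{i} of
    #          |S|! (n-|S|-1)! / n! * (v(S + {i}) - v(S))
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (v(set(S) | {i}) - v(set(S)))
        phi[i] = total
    return phi

# Toy cooperative game: reward grows superadditively with coalition size.
def v(S):
    return len(S) ** 2

players = [1, 2, 3]
phi = shapley_values(players, v)
# Efficiency axiom: the attributions exactly allocate v(N) - v(empty set).
print(sum(phi.values()), v(set(players)) - v(set()))
```

By symmetry each player receives an equal share here, and the shares sum exactly to the overall reward, which is the efficiency property stated above.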
Learning Point Cloud Representations with Pose Continuity for Depth-Based Category-Level 6D Object Pose Estimation
Zhujun Li, Shuo Zhang, Ioannis Stamos
Category-level object pose estimation aims to predict the 6D pose and 3D size of objects within given categories. Existing approaches for this task rely solely on 6D poses as supervisory signals without explicitly capturing the intrinsic continuity of poses, leading to inconsistencies in predictions and reduced generalization to unseen poses. To address this limitation, we propose HRC-Pose, a novel depth-only framework for category-level object pose estimation, which leverages contrastive learning to learn point cloud representations that preserve the continuity of 6D poses. HRC-Pose decouples object pose into rotation and translation components, which are separately encoded and leveraged throughout the network. Specifically, we introduce a contrastive learning strategy for multi-task, multi-category scenarios based on our 6D pose-aware hierarchical ranking scheme, which contrasts point clouds from multiple categories by considering rotational and translational differences as well as categorical information. We further design pose estimation modules that separately process the learned rotation-aware and translation-aware embeddings. Our experiments demonstrate that HRC-Pose successfully learns continuous feature spaces. Results on the REAL275 and CAMERA25 benchmarks show that our method consistently outperforms existing depth-only state-of-the-art methods and runs in real time, demonstrating its effectiveness and potential for real-world applications.
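The "rotational and translational differences" used for ranking can be made concrete with standard pose distances: the geodesic angle on SO(3) for rotations and the Euclidean norm for translations. This is a sketch of plausible distance functions under those standard definitions, not HRC-Pose's exact formulation.

```python
import numpy as np

def rotation_distance(R1, R2):
    # Geodesic distance on SO(3): the angle of the relative rotation R1^T R2,
    # recovered from its trace; clip guards against floating-point drift.
    cos_angle = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def translation_distance(t1, t2):
    # Plain Euclidean distance between translation vectors.
    return np.linalg.norm(t1 - t2)

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Two poses differing by a 30-degree rotation about z and a small shift.
R_a, t_a = rot_z(0.0), np.array([0.0, 0.0, 0.5])
R_b, t_b = rot_z(np.pi / 6), np.array([0.1, 0.0, 0.5])

print(rotation_distance(R_a, R_b))    # ~ pi/6
print(translation_distance(t_a, t_b))
```

Treating the two distances separately mirrors the decoupling described above: rotation-aware and translation-aware embeddings can each be ranked by their own metric rather than by a single mixed pose distance.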
Unsupervised Object Representation Learning using Translation and Rotation Group Equivariant VAE (Supplementary Material)
A.1 Calculating Kullback-Leibler divergence

Based on the standard definition of the KL divergence, we expand KL(q(z, θ, t, r | y) || p(z, θ, t, r)).

We generated two datasets, MNIST(N) and MNIST(U), by rotating and translating the digits in MNIST. Images in both datasets are 50×50 pixels.

A.3 Digit-wise rotation correlation and RMSE of the predicted rotations

We created a new dataset using multiple rotated and translated digits from MNIST(U). Some predicted rotations for digits 0, 1, and 8 are off by π from their ground-truth values. We find that the model correctly identifies and reconstructs the objects (Figure 3).
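For the unstructured latent z with a diagonal-Gaussian posterior and a standard-normal prior, the KL term has the familiar closed form used in standard VAEs. The sketch below assumes that common case only; it does not reproduce the paper's full factorization over (z, θ, t, r).

```python
import numpy as np

def kl_diag_gaussian_vs_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) )
    # = 0.5 * sum( exp(log_var) + mu^2 - 1 - log_var )
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

# When the posterior equals the prior, the divergence is exactly zero.
print(kl_diag_gaussian_vs_standard_normal(np.zeros(2), np.zeros(2)))
# Shifting one posterior mean to 1 contributes 0.5 * mu^2 = 0.5.
print(kl_diag_gaussian_vs_standard_normal(np.array([1.0, 0.0]), np.zeros(2)))
```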